Abstract. In this article, we characterize in terms of analytic tableaux
the repairs of inconsistent relational databases, that is, databases that do
not satisfy a given set of integrity constraints. For this purpose we provide
closing and opening criteria for branches in tableaux that are built for
database instances and their integrity constraints. We use the tableaux-based
characterization as a basis for consistent query answering, that is,
for retrieving from the database answers to queries that are consistent
wrt the integrity constraints.


1 Introduction 

The notion of consistent answer to a query posed to an inconsistent database
was defined in [?]: A tuple is a consistent answer if it is an answer, in the usual
sense, in every possible repair of the inconsistent database. A repair is a new 
database instance that satisfies the integrity constraints and differs from the 
original instance by a minimal set of changes wrt set inclusion. 

A computational methodology to obtain such consistent answers was also 
presented in [?]. Nevertheless, it has some limitations in terms of the syntactical
form of integrity constraints and queries it can handle. In particular, it does not 
cover the case of existential queries and constraints. 

In classical logic, analytic tableaux [?] are used as a formal deductive system
for propositional and predicate logic. Similar in spirit to resolution, but with
some important methodological and practical differences [?], they are mainly
used for producing formal refutations from a contradictory set of formulas. Starting
from a set of formulas, the system produces a tree with formulas in its nodes.
The set of formulas is inconsistent whenever all the branches in the tableau can 
be closed. A branch closes when it contains a formula and its negation. 

In this paper we extend the tableaux methodology to deal with a relational 
database instance plus a set of integrity constraints that the first fails to satisfy. 
Consequently, both inputs together can be considered as building an inconsistent 
set of sentences. In this situation, we give criteria for closing branches in a tableau
for a relational database instance. 

The technique of “opening tableaux” was introduced in [?] for a solution to
the frame problem, and in [?] for applying tableaux methods to default logic.
In this paper we show how to open tableaux for database instances plus their 
constraints, and this notion of opening is applied to characterize and represent 
by means of a tree structure all the repairs of the original database. Finally, 
we sketch how this representation could be used to retrieve consistent query 
answers. At least at the theoretical level, the methodology introduced in this 
paper could be applied to any kind of first order (FO) queries and constraints. 

This paper is organized as follows. In section 2, we define our notion of repair
of an inconsistent database. Section 3 recalls the definition of analytic tableaux
and shows how databases and their repairs can be characterized as openings of
closed tableaux. We then show the relationship between consistent query
answering and Winslett’s approach to knowledge base update; this allows us
to obtain some complexity results for our methodology. Next, we show how
consistent answers to queries posed to an inconsistent database can be obtained
using the analytic tableaux. Finally, we show the relationship of consistent
query answering with minimal entailment, more specifically with
circumscriptive reasoning. This yields a method for implementing the approach,
which is studied in section 6.2.


2 Inconsistent Databases and Repairs 

In this paper a database instance is given by a finite set of finite relations on a 
database schema. A database schema can be represented in logic by a typed first-order
language, C, containing a finite set of sorted database predicates and a fixed
infinite set of constants D. The language contains a predicate for each database
relation, and the constants in D correspond to the elements of the database
domain, which will also be denoted by D. That is, every database instance has
an infinite domain D. We also have a set of integrity constraints IC expressed
in language C. These are first-order formulas which the database instances are
expected to satisfy. In spite of this, there are realistic situations where a database
may not satisfy its integrity constraints [?]. If a database instance satisfies IC, we
say that it is consistent (wrt IC); otherwise we say it is inconsistent. In any case,
we will assume from now on that IC is a consistent set of first order sentences.

A database instance r can be represented by a finite set of ground atoms 
in the database language, or alternatively, as a Herbrand structure over this 
language, with Herbrand domain D [?]. In consequence, we can say that a
database instance r is consistent, wrt IC, when its corresponding Herbrand
structure is a model of IC, and we write $r \models IC$.

The active domain of a database instance r is the set of those elements of D 
that explicitly appear (in the extensions of the database predicates) in r. The 
active domain is always finite and we denote it by Act(r). We may also have a 
set of built-in (or evaluable) predicates, like equality, arithmetical relations, etc. 
In this case, we have the language C possibly extended with these predicates. In
all database instances each of these predicates has a fixed and possibly infinite 
extension. Of course, since we defined database instances as finite sets of ground 
atoms, we are not considering these built-in atoms as members of database 
instances. 

In database applications, it is usually the case that an inconsistent database^1
has “most” of its data contents still consistent wrt IC and can still provide
“consistent answers” to queries posed to it. The notion of consistent answer was
defined and analyzed in [?]. This was done on the basis of considering all possible
changes to r that turn it into a consistent database instance. A
consistent answer is an answer that can be retrieved from all those repairs that
differ from the original instance in a minimal way.

The notion of minimal change, defined in [?], is based on the notion of minimal
distance between models, using the symmetric set difference $\Delta$ of sets of
database tuples.

Definition 1. [?] Given database instances^2 r, r' and r'', we say that r' is
closer to r than r'' iff $r \Delta r' \subseteq r \Delta r''$. This is denoted by $r' \le_r r''$. □
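
The closeness relation of Definition 1 can be made concrete with a small sketch. It assumes, purely for illustration, that an instance is a Python set of ground atoms encoded as tuples; the function names are hypothetical and not part of the paper.

```python
def distance(r, other):
    """Symmetric difference of two instances (sets of ground atoms)."""
    return r ^ other


def closer_or_equal(r, r1, r2):
    """True iff r1 is at least as close to r as r2 (set inclusion of the differences)."""
    return distance(r, r1) <= distance(r, r2)


# Illustrative instances over one unary predicate P (an assumption of this sketch).
r = {("P", "a"), ("P", "b")}
r1 = {("P", "a")}   # differs from r by P(b)
r2 = set()          # differs from r by P(a) and P(b)
r3 = {("P", "b")}   # differs from r by P(a)
```

Here `r1` is closer to `r` than `r2`, while `r1` and `r3` are incomparable, which is why the relation is only a partial order.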

It is easy to see that $\le_r$ is an order relation. Only database predicates are
taken into account for the notion of distance. This is because built-in predicates 
are not subject to change; and then they have the same extension in all database 
instances. Now we can define the “repairs” of an inconsistent database instance. 

Definition 2. [?]

(a) Given database instances r and r', r' is a repair of r if $r' \models IC$ and r' is
a minimal element, wrt the order $\le_r$, in the set of instances that satisfy IC.

(b) Given a database instance r, a set IC and a first order query $Q(\bar{x})$, we say
that a ground tuple $\bar{t}$ is a consistent answer to Q in r wrt IC iff $r' \models Q[\bar{t}]$ for
every repair r' of r (wrt IC). □

Example 1. Consider the integrity constraint

IC: $\forall x, y, z\,(Supply(x, y, z) \land Class(z, T_4) \to x = C)$,

stating that $C$ is the only provider of items of class $T_4$; and the inconsistent
database $r = \{Supply(C, D_1, It_1), Supply(D, D_2, It_2), Class(It_1, T_4), Class(It_2, T_4)\}$.
We have only two possible (minimal) repairs of the original database instance,
namely $r_1 = \{Supply(C, D_1, It_1), Class(It_1, T_4), Class(It_2, T_4)\}$ and
$r_2 = \{Supply(C, D_1, It_1), Supply(D, D_2, It_2), Class(It_1, T_4)\}$.

Given the query $Q(x, y, z)$: $Supply(x, y, z)$?, the tuple $(C, D_1, It_1)$ is a
consistent answer because it can be obtained from every repair, but $(D, D_2, It_2)$
is not, because it cannot be retrieved from $r_1$. □
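
The repairs and the consistent answer of Example 1 can be checked by brute force. The sketch below is only an illustration, not the method developed in this paper: it encodes atoms as tuples and, as an assumption specific to this constraint (which can only be violated by the presence of tuples), searches for repairs among the subsets of r. All function names are hypothetical.

```python
from itertools import combinations

r = frozenset({
    ("Supply", "C", "D1", "It1"),
    ("Supply", "D", "D2", "It2"),
    ("Class", "It1", "T4"),
    ("Class", "It2", "T4"),
})


def satisfies_ic(inst):
    """IC of Example 1: every supplier of an item of class T4 must be C."""
    for atom in inst:
        if atom[0] == "Supply":
            _, x, _, z = atom
            if ("Class", z, "T4") in inst and x != "C":
                return False
    return True


def repairs(inst):
    """Consistent subsets of inst whose symmetric difference with inst is inclusion-minimal."""
    atoms = sorted(inst)
    consistent = [
        frozenset(c)
        for k in range(len(atoms) + 1)
        for c in combinations(atoms, k)
        if satisfies_ic(frozenset(c))
    ]
    return [c for c in consistent
            if not any((inst ^ d) < (inst ^ c) for d in consistent)]


def consistent_answers(inst):
    """Supply tuples that appear in every repair."""
    per_repair = [{a[1:] for a in rep if a[0] == "Supply"} for rep in repairs(inst)]
    return set.intersection(*per_repair)
```

The search enumerates all subsets of r, so it is exponential in the size of the instance and only meant for examples of this size.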

1 Sometimes we will simply say “database” instead of “database instance”. 

2 We are assuming here and everywhere in the paper that all database instances have 
the same predicates and domain. 
It is possible to prove [?] that for every database instance r and set IC of
integrity constraints, there is always a repair r'. If r is already consistent, then 
r is the only repair. The following lemma, easy to prove, will be useful. 

Lemma 1.

1. If $r' \le_r r''$, then $r \cap r'' \subseteq r \cap r'$.

2. If $r' \subseteq r$, then $r \Delta r' = r \setminus r'$. □
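
Both items follow by unfolding the definition of symmetric difference. The following derivation is a short sketch of that argument, offered as one possible proof rather than as the authors' own:

```latex
% Item 1: assume r \Delta r' \subseteq r \Delta r''.
\begin{align*}
r \setminus r' \;=\; (r \Delta r') \cap r \;\subseteq\; (r \Delta r'') \cap r \;=\; r \setminus r''
\quad\Longrightarrow\quad r \cap r'' \;\subseteq\; r \cap r'.
\end{align*}
% Item 2: assume r' \subseteq r, so that r' \setminus r = \emptyset.
\begin{align*}
r \Delta r' \;=\; (r \setminus r') \cup (r' \setminus r) \;=\; r \setminus r'.
\end{align*}
```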

We have given a semantic definition of consistent answer to a query in an
inconsistent database. We would like to compute consistent answers, but not by
computing all possible repairs and checking which answers they have in common.
In fact, the number of repairs may be exponential in the size of the database [?].

In [?] a mechanism for computing and checking consistent query answers
was considered. It does not produce or use the repairs, but it queries the only
explicitly available inconsistent database instance. Given a FO query $Q$, to obtain
the consistent answers wrt a finite set of FO ICs IC, $Q$ is qualified with appropriate
information derived from the interaction between $Q$ and IC. More precisely,
if we want the consistent answers to $Q(\bar{x})$ in r, the query is rewritten into a
new query $T(Q(\bar{x}))$; and then the (ordinary) answers to $T(Q(\bar{x}))$ are retrieved
from r.

Example 2. (Example 1 continued) Consider the query $Q$: $Supply(x, y, z)$?
about the items supplied together with their associated information. In order
to obtain the consistent answers, the query $T(Q)$: $Supply(x, y, z) \land
(Class(z, T_4) \to x = C)$ is generated and posed to the original database. The
extra conjunct in it is the “residue” obtained from the interaction between the
query and the constraint. Residues can be obtained automatically [?]. □
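
To see the rewritten query of Example 2 at work, the following sketch evaluates it directly on the inconsistent instance. The tuple encoding of atoms and the function name are assumptions of this illustration; the residue is hard-coded for this one constraint rather than derived automatically.

```python
r = {
    ("Supply", "C", "D1", "It1"),
    ("Supply", "D", "D2", "It2"),
    ("Class", "It1", "T4"),
    ("Class", "It2", "T4"),
}


def rewritten_answers(inst):
    """Ordinary answers to Supply(x,y,z) AND (Class(z,T4) -> x = C), evaluated on inst."""
    answers = set()
    for atom in inst:
        if atom[0] != "Supply":
            continue
        _, x, y, z = atom
        # Residue of the constraint: if z is of class T4, the supplier must be C.
        residue_holds = ("Class", z, "T4") not in inst or x == "C"
        if residue_holds:
            answers.add((x, y, z))
    return answers
```

On this instance the rewritten query returns only the tuple for supplier C, without any repair being computed.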

In general, $T$ is an iterative operator. There are sufficient conditions on
queries and ICs for soundness, completeness and termination of operator $T$;
and natural and useful syntactical classes satisfy those conditions. There are
some limitations though: $T$ cannot be applied to existential queries like

$Q(X)$: $\exists Y\, Supplies(X, Y, It_1)$?. However, this query does have consistent
answers at the semantic level. Furthermore, the methodology presented in [?]
assumes that the ICs are (universal) constraints written in clausal form.

There are fundamental reasons for the limitations of the query rewriting
approach. If a FO query could always be rewritten into a new FO query, then the
problem of consistent query answering (CQA) would have polynomial time data
complexity. From the results in this paper (see also [?]), we will see that CQA
is likely to have a higher computational complexity.

Notice that T is based on the interaction between the queries and the ICs. 
It does not consider the interaction between the ICs and the database instance. 
In this paper we concentrate mostly on this second form of interaction. In particular,
we wonder if we can obtain an implicit and compact representation of
the database repairs. 
Furthermore, the database seen as a set of logical formulas plus IC is an
inconsistent first order theory; and we know that such an inconsistency can be 
detected and represented by means of an analytic tableau. 

An analytic tableau is a syntactically generated tree-like structure that, starting
from a set of formulas placed at the root, has all its branches “closed” when
the initial set of formulas is inconsistent. Such a tableau can show us how to repair
inconsistencies, because closed branches can be opened by removing literals.

In this work, we show how to generate, close and open tableaux for database
instances with their constraints; and we apply the notion of opening to characterize
and represent by means of a tree structure all the repairs of the original
database. Finally, we sketch how this representation could be used to retrieve
consistent query answers. At least at the theoretical level, the methodology
introduced here could be applied to any kind of first order queries and constraints.


3 Database Repairs and Analytic Tableaux 

In order to use analytic tableaux to represent database repairs and characterize
consistent query answers, we need a special form of tableaux, suitable for
representing database instances and their integrity constraints.

Given a database instance r and a finite set of integrity constraints IC, we
first compute the tableau, $TP(IC \cup r)$, for IC and r. This tableau has as root
node the set of formulas $IC \cup r$. It should be closed, that is, have only
closed branches, if and only if r is inconsistent. By removing
database literals in every closed branch we can transform r into a consistent
database instance and thus obtain a repair of the database. For all this to work,
we must take into account, when computing the tableau, that r represents a
database instance and not just a set of formulas; in particular, that the absence
of positive information means negative information, etc. (see section 3.2). Next,
we give a brief review of classical first order analytic tableaux [?].


3.1 Analytic tableaux 

The tableau of a set of formulas is obtained by recursively breaking down the
formulas into subformulas, obtaining sets of sets of formulas. The rules follow
the usual Smullyan classes of formulas.

A tableaux prover produces a formula tree. An $\alpha$-rule adds new formulas to
branches; a $\beta$-rule splits the tableau and adds a new branch. Given a formula $\varphi$,
we denote by $TP(\varphi)$ the tree produced by the tableaux system. We can think
of this tree as the set of its branches, which we usually denote by $X, Y, \ldots$.
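
The way $\alpha$- and $\beta$-rules build the set of branches can be sketched for the propositional case. This is a minimal illustration under assumed conventions, not the prover used in the paper: atoms are strings, compound formulas are tuples tagged `"not"`, `"and"` or `"or"`, and a branch is returned as the set of literals it contains.

```python
def expand(formulas):
    """Return the branches (frozensets of literals) of a propositional tableau."""
    pending = [(list(formulas), frozenset())]
    finished = []
    while pending:
        todo, lits = pending.pop()
        if not todo:
            finished.append(lits)
            continue
        f, rest = todo[0], todo[1:]
        if isinstance(f, str):                          # positive literal
            pending.append((rest, lits | {f}))
        elif f[0] == "not" and isinstance(f[1], str):   # negative literal
            pending.append((rest, lits | {f}))
        elif f[0] == "not" and f[1][0] == "not":        # double negation
            pending.append(([f[1][1]] + rest, lits))
        elif f[0] == "and":                             # alpha rule: extend the branch
            pending.append(([f[1], f[2]] + rest, lits))
        elif f[0] == "not" and f[1][0] == "or":         # alpha rule for a negated disjunction
            pending.append(([("not", f[1][1]), ("not", f[1][2])] + rest, lits))
        elif f[0] == "or":                              # beta rule: split the branch
            pending.append(([f[1]] + rest, lits))
            pending.append(([f[2]] + rest, lits))
        elif f[0] == "not" and f[1][0] == "and":        # beta rule for a negated conjunction
            pending.append(([("not", f[1][1])] + rest, lits))
            pending.append(([("not", f[1][2])] + rest, lits))
        else:
            raise ValueError(f"unsupported formula: {f!r}")
    return finished


def is_closed(branch):
    """A branch is closed if it contains an atom together with its negation."""
    return any(("not", lit) in branch for lit in branch if isinstance(lit, str))
```

Quantifier rules, parameters and equality are deliberately left out of this sketch.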

Notice that the original set of constants in the language, in our case, D,
is extended with a set of new constants, P, the so-called Skolem functions or
parameters. These parameters, which we will denote by $p, p_1, \ldots$, have to be new
at the point of their introduction in the tree, in the sense that they have not
appeared so far in the (same branch of the) tableau. When applying the $\gamma$-rule,
the parameter can be any of the old or new constants.

A tableau branch is closed if it contains a formula and its negation; otherwise
it is open. Every open branch corresponds to a model of the formula: If a branch
$B \in TP(\varphi)$ is open and finished, then the set of ground atoms on $B$ is a model of
$\varphi$. If the set of initial formulas is inconsistent, it does not have models, and then
all branches (and thus the tableau) have to be closed. Actually, the completeness
theorem for tableaux theorem proving [?] states that: $F$ is a theorem iff
$TP(\{\neg F\})$ is closed.

The intuitive idea of a finished branch, that is, one to which no tableaux rule can
be applied to obtain something new and relevant, is captured by means of the
notion of saturated branch: this is a branch where all possible rules have been
applied.

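Definition 3. A branch $B$ is saturated iff it satisfies

1. If $\neg\neg\varphi \in B$, then $\varphi \in B$

2. If $(\varphi \lor \psi) \in B$, then $\varphi \in B$ or $\psi \in B$

3. If $(\varphi \land \psi) \in B$, then $\varphi \in B$ and $\psi \in B$

4. If $\exists x\, \varphi \in B$, then $\varphi[c] \in B$ for some constant $c$

5. If $\forall x\, \varphi \in B$, then $\varphi[c] \in B$ for any constant $c$.^3 □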

A branch is called Hintikka if it is saturated and not closed [?]. It is easy to
see that a saturated branch is Hintikka iff it does not contain any atomic formula
$A$ and its negation $\neg A$. From now on, tableaux branches will be assumed to be
saturated. Nevertheless, sometimes we talk about branches even when they are
only partially developed.

We consider TP not only as a theorem prover (or consistency checker) for 
formulae but also as an application from (sets of) formulas to trees which has 
some useful properties. Thus, operations on tableaux can be defined on the basis 
of the logical connectives occurring inside the formulas involved. 

Lemma 2. Let $\varphi$ and $\psi$ be any formulae. Then $TP$ has the following properties.

1. $TP(\{\varphi \lor \psi\}) = TP(\{\varphi\}) \cup TP(\{\psi\})$

2. $TP(\{\varphi \land \psi\}) = \{X \cup Y : X \in TP(\{\varphi\}) \text{ and } Y \in TP(\{\psi\})\}$

3. If $B \in TP(\varphi \land \psi)$ then $B = B' \cup B''$ with $B' \in TP(\varphi)$ and $B'' \in TP(\psi)$. □

3 If the language had function symbols, we would have to replace constants by ground
terms in this definition.

Property 3. follows directly from properties 1. and 2. The properties in the 
lemma motivate the following definition. 

Definition 4. Given tableaux $T$ and $T'$, each of them identified with the set of
its branches, the combined tableaux is $T \otimes T' = \{X \cup Y : X \in T \text{ and } Y \in T'\}$.
□
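
With tableaux identified with their sets of branches, the combination of Definition 4 is a pairwise union. A minimal sketch, assuming branches are encoded as frozensets of literal strings (an encoding chosen here for illustration only):

```python
def combine(t1, t2):
    """Combined tableau: every union of a branch of t1 with a branch of t2."""
    return {x | y for x in t1 for y in t2}


# Two small tableaux given directly as sets of branches.
t_left = {frozenset({"p"}), frozenset({"q"})}
t_right = {frozenset({"not p"}), frozenset({"r"})}
combined = combine(t_left, t_right)
```

The result has one branch per pair of branches, so here it contains four branches, one of which holds both `p` and `not p`.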

Remark 1. The properties in Lemma 2 can be used to check whether a formula
$\varphi$ derives from a theory $A$. We have $A \models \varphi$ iff $(A \to \varphi)$ is a theorem, which is proved
by deriving a contradiction from the assumption $\neg(A \to \varphi)$. Therefore we have
to compute $TP(\{\neg(A \to \varphi)\})$ and check for closure. Using the second property,
we check $TP(\{A\}) \otimes TP(\{\neg\varphi\})$ for closure, which allows us to compute $TP(A)$
only once for any number of requests. □

The following relationship between the open branches of the tableaux for a
formula and its models has been shown, among others, by [?].

Theorem 1. Let $B \in TP(\{\varphi\})$ be an open branch of the tableau for $\varphi$. Then
there is a model $M$ of $\varphi$ which satisfies $B$, i.e. $B \subseteq M$. More precisely, there is
a Herbrand model $M$ of $\varphi$ such that the ground atoms in $B$ belong to $M$. □


3.2 Representing database instances by tableaux 

In database theory, we usually make the following assumptions^4: (a) Unique
Names Assumption (UNA): If $a$ and $b$ are different constants in D, then $a \neq b$
holds in r. (b) Closed World Assumption (CWA): If r is a database instance,
then for any ground database atom $P(\bar{c})$, if $P(\bar{c}) \notin r$, then $\neg P(\bar{c})$ holds for r;
more precisely, $\neg P(\bar{c})$ implicitly belongs to r.

In consequence, if we see the relational database as the set of its explicit 
atoms plus its implicit negative atoms, we can always repair the database by 
removing ground database literals. 

When computing a tableau for a database instance r, we do not add explicitly 
the formulas corresponding to the UNA and CWA, rather we keep them implicit, 
but taking them into account when computing the tableau. This means, for 
example, that the presence on a tableau branch of a formula a = b, for different 
constants a, b in D , closes the branch. 

Given a database r and integrity constraints IC, we will generate the tableau
$TP(IC \cup r)$. Notice that every branch $B$ of this tableau will be of the form $I \cup r$,
where $I \in TP(IC)$ (see Lemma 2). $I$ is the “IC-part” of the branch.

Notice also that a tableau for IC only will never be closed, because IC is 
consistent. The same happens with any tableau for r. Only the combination of 
r and IC may produce a closed tableau. 

4 Actually, it is possible to make all these assumptions explicit and transform the 
database instance into a first-order theory [?].
$TP(IC \cup r)$ is defined as in section 3.1, but we still have to define the closure
conditions for tableaux associated to database instances. Before doing so, we present
some motivating examples.


Example 3. (Example 1 continued) In this case, $TP(IC \cup r)$ is the tree in Figure 1.
The last branch is closed because $D = C$ is false in the database (alternatively,
because $D \neq C$ is implicitly in the database). We can see that $TP(IC \cup r)$ is
closed, so r is inconsistent wrt IC. The nodes $(Supply(C, D_1, It_1) \land Class(It_1, T_4)
\to C = C)$ and $(Supply(D, D_2, It_2) \land Class(It_2, T_4) \to D = C)$ are obtained
by applying the $\gamma$-rule to $\forall x, y, z\,(Supply(x, y, z) \land Class(z, T_4) \to x = C)$.
Application of the $\beta$-rule to $(Supply(D, D_2, It_2) \land Class(It_2, T_4) \to D = C)$
produces the same subtree for all three leaves: $\neg Supply(C, D_1, It_1)$, $\neg Class(It_1, T_4)$
and $C = C$. In the figure, we indicate this subtree by “...”. We will see later (see
section 3.3) that, in some cases, we can omit the development of subtrees that
should develop under branches that are already closed. Here we can omit the
explicit further development of the subtree from the first two leftmost branches,
because these branches are already closed. □

    $\forall x, y, z\,(Supply(x, y, z) \land Class(z, T_4) \to x = C)$
    $Supply(D, D_2, It_2)$
    $Supply(C, D_1, It_1)$
    $Class(It_1, T_4)$
    $Class(It_2, T_4)$
    $Supply(C, D_1, It_1) \land Class(It_1, T_4) \to C = C$
    $Supply(D, D_2, It_2) \land Class(It_2, T_4) \to D = C$
    |-- $\neg Supply(C, D_1, It_1)$   ×  ...
    |-- $\neg Class(It_1, T_4)$   ×  ...
    `-- $C = C$
        |-- $\neg Supply(D, D_2, It_2)$   ×
        |-- $\neg Class(It_2, T_4)$   ×
        `-- $D = C$   ×

Fig. 1. Tableau for Example 3
In tableaux with equality, we need extra rules. We will assume that we can
always introduce equalities of the form $t = t$, for a term $t$, and that we can replace
a term $t$ in a predicate $P$ by $t'$ whenever $t = t'$ belongs to the same tableau branch
(paramodulation, [?]). It will be simpler to define the closure rules for database
tableaux if we skolemize existential formulas before developing the tableau [?].
We assume from now on that all integrity constraints are skolemized by means
of a set of Skolem constants (the parameters in P) and new function symbols.

Example 4. Consider the referential IC: $\forall x\,(P(x) \to \exists y\, Q(x, y))$, and the
inconsistent database instance $r = \{P(a), Q(b, d)\}$, for $a, b, d \in D$. With an
initial skolemization, we can develop the following tableau $TP(IC \cup r)$. In this
tableau, the second branch closes because $Q(a, f(a))$ does not belong to the
database instance: there is no $x$ in the active database domain such that r
contains $Q(a, x)$. Implicitly, by the CWA, r then contains $\neg Q(a, x)$ for any $x$.
Hence the branch containing $Q(a, f(a))$ closes and r is inconsistent wrt IC.

    $\forall x\,(P(x) \to Q(x, f(x)))$
    $P(a), Q(b, d)$
    |-- $\neg P(a)$   ×
    `-- $Q(a, f(a))$   ×


Example 5. Consider the inconsistent database $r_1 = \{Q(a), Q(b)\}$ wrt the IC:
$\exists x\, P(x)$. After having skolemized $\exists x\, P(x)$ into $P(p)$, a tableau proof for the
inconsistency is the following:

    $P(p)$
    $Q(a), Q(b)$
    ×

This branch closes because there is no $x$ in D such that $P(x) \in r_1$, and therefore
$\neg P(x)$ belongs to $r_1$ for any $x$ in D. $P(p)$ cannot belong to this database. □

Example 6. Let us now change the database instance in Example 5 to $r_2 =
\{P(a), P(b)\}$, keeping the integrity constraint. Now, the database is consistent,
and we have the following tableau $TP(IC \cup r_2)$:

    $P(p)$
    $P(a), P(b)$

This time we do not want the tableau to close, since that would sanction the
inconsistency of the database. The reason is that we could make $p$ take any of
the values in the active domain $\{a, b\} \subseteq D$ of the database. □
A similar situation can be found in a modified version of Example 4.

Example 7. Change the database instance in Example 4 to $\{P(a), Q(a, d)\}$. Now
it is consistent wrt the same IC. We obtain

    $\forall x\,(P(x) \to Q(x, f(x)))$
    $P(a), Q(a, d)$
    |-- $\neg P(a)$   ×
    `-- $Q(a, f(a))$

Now we do not close the rightmost branch because we may define $f$ as a
function from the active domain into itself that makes $Q(a, f(a))$ become a
member of the database, actually by defining $f(a) = d$. □

Example 8. Consider IC: $\exists x\, \neg P(x)$ and the consistent database instance $r =
\{P(a)\}$. The tableau $TP(IC \cup r)$ after skolemization of IC is:

    $\neg P(p)$
    $P(a)$

This tableau cannot be closed, because $p$ must be a new parameter, not
occurring in the same branch of the tableau, and it is not the case that $P(p) \in r$
(alternatively, we may think of $p$ as a constant that can be defined as any element
in $D \setminus \{a\}$, that is, in the complement of the active domain of the database). □

In general, a tableau branch closes whenever it contains a formula and its 
negation. However, in our case, it is necessary to take into account that not 
all literals are explicit on branches due to the UNA and CWA. The following 
definition of closed branch modifies the standard definition, and considers those 
assumptions. 

Definition 5. Let $B$ be a tableau branch for a database instance r with integrity
constraints IC, say $B = I \cup r$. $B$ is closed iff one of the following conditions holds:

1. $a = b \in B$ for different constants $a, b$ in D.

2. (a) $P(\bar{c}) \in I$ and $P(\bar{c}) \notin r$, for a ground tuple $\bar{c}$ containing elements of D
only.

(b) $P(\bar{c}) \in I$ and there is no substitution $\sigma$ for the parameters in $\bar{c}$ such that
$P(\bar{c})\sigma \in r$.^5

3. $\neg P(\bar{c}) \in I$ and $P(\bar{c}) \in r$ for a ground tuple $\bar{c}$ containing elements of D only.

4. $\varphi \in B$ and $\neg\varphi \in B$, for an arbitrary formula $\varphi$.

5. $\neg\, t = t \in B$ for any term $t$. □

5 A substitution is given as a pair $\sigma = (p, t)$, where $p$ is a variable (parameter) and $t$
is a term. The result of applying $\sigma$ to formula $F$, noted $F\sigma$, is the formula obtained
by replacing every occurrence of $p$ in $F$ by $t$.

Condition 1. takes UNA into account. Notice that it is restricted to database
constants, so that it does not apply to new parameters^6. Condition 2(a) takes
CWA into account. The alternative condition 2(b) (which actually subsumes 2(a))
accounts for the examples above in which a positive atom of the constraints
contains Skolem symbols.

In condition 3. one might miss a second alternative as in condition 2., something
like “$\neg P(\bar{c}) \in I$ for a ground tuple containing Skolem symbols, when there
is no way to define them considering elements of $D \setminus Act(r)$ in such a way that
$P(\bar{c}) \notin r$”. This condition can never be satisfied because we have an infinite
database domain D, but a finite active domain $Act(r)$. So, it will never apply.
This accounts for the example above with the constraint $\exists x\, \neg P(x)$. Conditions 4.
and 5. are the usual closure conditions. Conditions 2(a) and 3. are special cases of 4.
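
The database-specific closure conditions 1, 2 and 3 of Definition 5 can be sketched as follows. The encoding is an assumption of this illustration only: atoms are tuples, literals of the IC-part are tagged `"pos"`, `"neg"` or `"eq"`, and any term whose name starts with `?` stands for a parameter or Skolem term. The generic conditions 4 and 5 are omitted.

```python
def is_param(term):
    """Hypothetical convention: parameters and Skolem terms start with '?'."""
    return term.startswith("?")


def matches(atom, fact):
    """True iff some substitution of the parameters in atom turns it into fact."""
    if len(atom) != len(fact) or atom[0] != fact[0]:
        return False
    subst = {}
    for term, const in zip(atom[1:], fact[1:]):
        if is_param(term):
            if subst.setdefault(term, const) != const:
                return False
        elif term != const:
            return False
    return True


def is_closed(i_part, r):
    """Check conditions 1, 2 and 3 for a branch with IC-part i_part over instance r."""
    for lit in i_part:
        if lit[0] == "eq":                     # condition 1 (UNA)
            _, a, b = lit
            if a != b and not is_param(a) and not is_param(b):
                return True
        elif lit[0] == "pos":                  # condition 2(b), which subsumes 2(a)
            if not any(matches(lit[1], fact) for fact in r):
                return True
        elif lit[0] == "neg":                  # condition 3, ground atoms only
            atom = lit[1]
            if not any(is_param(t) for t in atom[1:]) and atom in r:
                return True
    return False
```

Under this encoding the branches of Examples 4 and 5 close, while those of Examples 6, 7 and 8 stay open, in line with the discussion above.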

Now we can state the main properties of tableaux for database instances and 
their integrity constraints. 

Proposition 1. For a database instance r and integrity constraints IC, it holds:

1. r is inconsistent wrt IC iff the tableau $TP(IC \cup r)$ is closed (i.e. each
of its branches is closed).

2. $TP(IC \cup r)$ is closed iff r does not satisfy IC (i.e. $r \not\models IC$). □

3.3 Opening tableaux 

The inconsistency of a database r wrt IC is characterized by a tableau $TP(IC \cup
r)$ which has only closed branches. In order to obtain a repair of r, we may remove
the literals in the branches which are “responsible” for the inconsistencies, even
implicit literals corresponding to the CWA. Every branch which can be “opened”
in this way will possibly yield a repair. We can only repair inconsistencies due to
literals in r. We cannot remove literals in $I$ because, according to our approach,
integrity constraints are rigid: we are not willing to give them up; we only allow
changes in the database instances. We cannot suppress equalities $a = b$, nor
built-in predicates.

Remark 2. According to Definition 5, we can repair inconsistencies due only to
cases 2. and 3. More precisely, given a closed branch $B$ in $TP(IC \cup r)$:

1. If $B$ is closed because of the CWA, it can be opened by inserting $P(\bar{c})\sigma$ into
r, or, equivalently, by removing the implicit literal $\neg P(\bar{c})\sigma$ from r, for any
substitution $\sigma$ from the parameters into D (case 2(b) in Def. 5).

2. If $B$ is closed because of contradictory literals $\neg P(\bar{c}) \in I$ and $P(\bar{c}) \in r$, then
it can be opened by removing $P(\bar{c})$ from r (case 3 in Def. 5). □

6 That is, elements of P are treated as null values in Reiter’s logical reconstruction of 
relational databases [?].
Example 9. (Example 3 continued) The tableau has 9 closed branches, $B_1, \ldots, B_9$
(we display the literals within the branches only). The first four tuples in every
branch correspond to the initial instance r:

    $Supply(C, D_1, It_1)$, $Supply(D, D_2, It_2)$, $Class(It_1, T_4)$, $Class(It_2, T_4)$

Each branch $B_i$ consists of an $I$-part and the r-part, say $B_i = r \cup I_i$, and
the $I$-parts are:

    $I_1$:  $\neg Supply(C, D_1, It_1)$,  $\neg Supply(D, D_2, It_2)$
    $I_2$:  $\neg Supply(C, D_1, It_1)$,  $\neg Class(It_2, T_4)$
    $I_3$:  $\neg Supply(C, D_1, It_1)$,  $D = C$
    $I_4$:  $\neg Class(It_1, T_4)$,  $\neg Supply(D, D_2, It_2)$
    $I_5$:  $\neg Class(It_1, T_4)$,  $\neg Class(It_2, T_4)$
    $I_6$:  $\neg Class(It_1, T_4)$,  $D = C$
    $I_7$:  $C = C$,  $\neg Supply(D, D_2, It_2)$
    $I_8$:  $C = C$,  $\neg Class(It_2, T_4)$
    $I_9$:  $C = C$,  $D = C$


In order to open this closed tableau, we can remove literals in the closed
branches. Since a tableau is open whenever it has an open branch, each opened
branch of the closed tableau might produce one possible transformed open
tableau. Since we want to modify the database r, which should become
consistent, we should try to remove a minimal set of literals in the r-part of the
branches in order to open the tableau. This automatically excludes branches
$B_3$, $B_6$ and $B_9$, because they close due to the literals $D = C$, which do not
correspond to database literals, but come from the constraints.

In this example we observe that the sets of database literals of some of the
$I_j$ are included in others. Let us denote by $I'_j$ the set of literals in $I_j$ that are
database literals (i.e. not built-in literals), e.g. $I'_1 = I_1$, $I'_7 = \{\neg Supply(D, D_2,
It_2)\}$. We have then $I'_1 \supset I'_7$, $I'_2 \supset I'_8$, $I'_3 \supset I'_9$, $I'_4 \supset I'_7$, $I'_5 \supset I'_8$, $I'_6 \supset I'_9$. This
shows, for example, that in order to open $B_1$, we have to remove from r a superset
of the set of literals that have to be removed from r for opening $B_7$. Hence, we
can decide that the branches whose database part contains the database part
of another branch can be ignored because they will not produce any (minimal)
repairs. This allows us not to consider $B_1$ through $B_6$ in our example, and $B_7$
and $B_8$ are the only branches that can lead us to repairs. □
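
The pruning argument of Example 9 can be replayed mechanically. The sketch below is an illustration under assumed conventions (literals as plain strings, equalities recognized by the substring `" = "`, hypothetical function names): it discards branches closed by the equality $D = C$ and keeps those whose database part is minimal wrt set inclusion.

```python
I_PARTS = {
    "B1": {"not Supply(C,D1,It1)", "not Supply(D,D2,It2)"},
    "B2": {"not Supply(C,D1,It1)", "not Class(It2,T4)"},
    "B3": {"not Supply(C,D1,It1)", "D = C"},
    "B4": {"not Class(It1,T4)", "not Supply(D,D2,It2)"},
    "B5": {"not Class(It1,T4)", "not Class(It2,T4)"},
    "B6": {"not Class(It1,T4)", "D = C"},
    "B7": {"C = C", "not Supply(D,D2,It2)"},
    "B8": {"C = C", "not Class(It2,T4)"},
    "B9": {"C = C", "D = C"},
}


def db_part(literals):
    """Database literals of an I-part: everything except equality literals."""
    return frozenset(l for l in literals if " = " not in l)


def repair_candidates(i_parts, false_equalities=("D = C",)):
    """Branches not closed by a false equality whose database part is inclusion-minimal."""
    data_closed = {
        name: db_part(lits)
        for name, lits in i_parts.items()
        if not any(eq in lits for eq in false_equalities)
    }
    return sorted(
        name for name, part in data_closed.items()
        if not any(other < part for other in data_closed.values())
    )
```

On the nine I-parts of the example this leaves exactly `B7` and `B8`.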

The following lemma tells us that we can ignore branches with subsumed
$I$-parts, because those branches cannot become repairs.

Lemma 3. If $r'' \subseteq r' \subseteq r$, then $r' \le_r r''$. □

Moreover, as illustrated above, where the tableau tree is shown, sometimes we
can detect possible subsuming branches without fully developing the tableau. In
that example, the first formula has been split by a tableau rule and we have already
closed two branches. When we apply another rule, we then know that the branch
$C = C$, which is not closed yet, will either not be closed or will be closed by a subset
of the database literals appearing in the first two branches.

Definition 6. Let $B = I \cup r$ be a closed branch of the tableau $TP(IC \cup r)$.

(a) If $I$ is not closed, i.e. the branch is closed due to database literals only, we
say that $B$ is data closed.

(b) Let $B = I \cup r$ be a data closed branch in the tableau $TP(IC \cup r)$. We define
$op(B) := (r \setminus L(B)) \cup K(B)$, where

1. $L(B) = \{ l \mid l \in r \text{ and } \neg l \in I \}$

2. $K(B) = \tau\{ l \mid l \text{ is a ground atom in } I \text{ and there is no substitution } \sigma \text{ such that } l\sigma \in r \}$, where $\tau$ is any substitution of the parameters into $D$.

(c) An instance $r'$ is called an opening of $r$ iff $r' = op(B)$ for a data closed
branch $B$ in $TP(IC \cup r)$. □
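
For branches without parameters, the construction of $op(B)$ can be sketched in a few lines of Python. This is only an illustration under assumptions made here: the encoding of atoms as tuples and of literals as `(positive, atom)` pairs, and the function name `opening`, are not notation from the paper. The instance is the one used in Example 11.

```python
# Sketch of op(B) = (r \ L(B)) ∪ K(B) for branches without parameters.
# Assumed encoding (not from the paper): an atom is a tuple such as ("P", "a");
# a literal is a pair (positive, atom), with positive=False for a negated atom.

def opening(I, r):
    """Return op(B) for a data closed branch B = I ∪ r (ground case)."""
    # L(B): atoms of r whose negation occurs in I; they are deleted.
    L = {atom for atom in r if (False, atom) in I}
    # K(B): positive ground atoms of I that are not in r; they are inserted.
    K = {atom for (positive, atom) in I if positive and atom not in r}
    return (set(r) - L) | K


# Instance of Example 11: r = {P(a), R(b)}. Only the literals of each
# I-part that matter for the opening are listed.
r = {("P", "a"), ("R", "b")}
op_B1 = opening({(False, ("P", "a"))}, r)   # deletes P(a)
op_B2 = opening({(True, ("Q", "a"))}, r)    # inserts Q(a)
op_B3 = opening({(True, ("Q", "b"))}, r)    # inserts Q(b)
```

The sketch ignores the substitutions $\sigma$ and $\tau$ of Definition 6, so it applies only when no parameters have been introduced in the branch.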

If the branch $B$ is clear from the context, we simply write $r' = (r \setminus L) \cup K$.
If no parameters have been introduced in the branch, then we do not need to
consider the substitutions above. In this case, for an opening $I \cup r'$ of a branch
$I \cup r$ it holds: (a) If $P(c) \in I$ and $P(c) \notin r$, then $P(c) \in r'$. (b) If $\neg P(c) \in I$ and
$P(c) \in r$, then $P(c) \notin r'$. Notice that we only open branches which are closed
because of conflicting database literals.

When $r \models IC$, $TP(IC \cup r)$ will have (finished) open branches $B$. For
any of those branches, $op(B)$ can be defined exactly as in Definition 6. It is easy
to verify that in this case $op(B)$ coincides with the original instance $r$.




Bertossi and Schwind 


Proposition 2. Let $r'$ be an opening of $r$. Then $r'$ is consistent with $IC$, i.e.
$r' \models IC$. □


Example 10. Consider $r = \{P(a), Q(a), R(b)\}$ and $IC = \{\forall x(P(x) \to Q(x))\}$.
Here $r \models IC$ and $TP(IC \cup r)$ is


                P(a), Q(a), R(b)
                ∀x(P(x) → Q(x))
                 /            \
             ¬P(b)            Q(b)
             /    \             ×
         ¬P(a)    Q(a)         B_3
           ×
          B_1      B_2



The first branch, $B_1$, is closed and $op(B_1) = \{Q(a), R(b)\}$, which satisfies $IC$. The
second branch, $B_2$, is open and $op(B_2) = r$. The third branch, $B_3$, is closed
and $op(B_3) = \{P(a), Q(a), Q(b), R(b)\}$, which satisfies $IC$. Notice that we could
further develop the last node there, obtaining the same tree that is hanging from
$\neg P(b)$ in the tree on the LHS. If we do this, we obtain closed branches $B_4$, $B_5$,
with $op(B_4) = \{Q(a), Q(b), R(b)\}$ and $op(B_5) = \{P(a), Q(a), Q(b), R(b)\}$. With
these last two openings we do not get any closer to $r$ than with $op(B_3)$, which is
still not as close to $r$ as the only repair, $r$, obtained with branch $B_2$. □


Example 11. Consider $IC$ as in Example 10, but now $r = \{P(a), R(b)\}$, which does
not satisfy $IC$. $TP(IC \cup r)$ is

                   P(a), R(b)
                ∀x(P(x) → Q(x))
                 /            \
             ¬P(b)            Q(b)
             /    \             ×
         ¬P(a)    Q(a)         B_3
           ×        ×
          B_1      B_2

For the first branch $B_1$, we obtain $op(B_1) = \{R(b)\}$, that is a repair. Branch $B_2$
gives $op(B_2) = \{P(a), R(b), Q(a)\}$, the other repair.

For the closed branch $B_3$ we have $op(B_3) = \{P(a), Q(b), R(b)\}$. This is not
a model of $IC$, apparently contradicting Proposition 2; in particular, it is not a
repair of $r$. If we keep developing node $Q(b)$ exactly as $\neg P(b)$ on the LHS, we
obtain extended (closed) branches, with associated instances $\{Q(b), R(b)\}$ and
$\{P(a), Q(a), Q(b), R(b)\}$. Both of them satisfy $IC$, but are non minimal, and
hence they are not repairs of $r$. This example shows the importance of having the
open and closed branches (maybe not explicitly) saturated (see Definition 

We can see that every opening is related to a possibly non minimal repair
of the original database instance.$^7$ For repairs, we are only interested in “minimally”
opened branches, i.e. in open branches which are as close as possible to
$r$. In consequence, we may define a minimal opening $r'$ as an opening such that
$r \Delta r'$ is minimal under set inclusion.

Openings of $r$ are obtained by deletion of literals from $r$, or, equivalently,
by deletion/insertion of atoms from/into $r$. In order to obtain minimal repairs,
we have to make a minimal set of changes. Therefore, we do not keep an opening
$r''$ such that $r' \Delta r \subsetneq r'' \Delta r$, where $r'$ is another
opening. We will show subsequently that the openings we keep are those where $L$ and $K$
are minimal in the sense of set inclusion wrt all other openings in the same tree.

The following results establish a relationship between the order of repairs
defined in Definition 1 and the set inclusion of the database atoms that have
been inserted or deleted when opening a database instance.

Lemma 4. For any opening $r' = (r \setminus L) \cup K$, we have $r \Delta r' = L \cup K$. □

Proposition 3. Let $r_1 = (r \setminus L_1) \cup K_1$ and $r_2 = (r \setminus L_2) \cup K_2$. Then $r_1$ is closer
to $r$ than $r_2$, i.e. $r_1 <_r r_2$, iff $L_1 \subseteq L_2$ and $K_1 \subseteq K_2$. □

Theorem 2. Let $r$ be an inconsistent database wrt $IC$. Then $r'$ is a repair of
$r$ iff there is an open branch $I$ of $TP(IC)$, such that $I \cup r$ is closed and $I \cup r'$ is
a minimal opening of $I \cup r$ in $TP(IC \cup r)$. □
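
The selection of minimal openings can be illustrated as follows. This is a minimal sketch, assuming instances are given as sets of ground atoms encoded as tuples; the function name `minimal_openings` is hypothetical. An opening is kept when no other candidate has a strictly smaller symmetric difference with $r$ under set inclusion.

```python
# Sketch: keep the openings whose symmetric difference with r is minimal
# under set inclusion. The tuple encoding of atoms is an assumption.

def minimal_openings(r, openings):
    r = frozenset(r)
    candidates = {frozenset(o) for o in openings}
    # delta[o] is the set of atoms deleted from or inserted into r.
    delta = {o: r ^ o for o in candidates}
    # "<" on frozensets is strict set inclusion.
    return {o for o in candidates
            if not any(delta[p] < delta[o] for p in candidates)}


# Instance of Example 11 and the four instances satisfying IC found there.
Pa, Qa, Qb, Rb = ("P", "a"), ("Q", "a"), ("Q", "b"), ("R", "b")
r = {Pa, Rb}
found = [{Rb}, {Pa, Qa, Rb}, {Qb, Rb}, {Pa, Qa, Qb, Rb}]
repairs = minimal_openings(r, found)
```

The comparison is quadratic in the number of candidate openings, which is one reason the later sections look for a test that is local to a branch.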

Example 12. (Example 9 continued) $TP(IC \cup r)$ has two minimal openings:

$r_7$                       $r_8$
$Supply(C, D_1, I_1)$       $Supply(C, D_1, I_1)$
$Class(I_1, T_4)$           $Class(I_1, T_4)$
$Class(I_2, T_4)$           $Supply(D, D_2, I_2)$

The rightmost closed branch cannot be opened because it is closed by the atom 
D = C which is not a database predicate. □ 


7 Strictly speaking, we should not say “non minimal repair”, because repairs are minimal
by definition. Instead, we should talk of database instances that differ from the
original one and satisfy the ICs. In any case, we think there should be no confusion 
if we relax the language in this sense. 
4 Repairs, Knowledge Base Updates and Complexity


Our definition of repairs is based on a minimal distance function as used by
Winslett for knowledge base update. More precisely, Winslett in her “possible
models approach” defines the knowledge base change operator $\circ$ for the
update of a propositional knowledge base $K$ by a propositional formula $p$ by

$$Mod(K \circ p) = \bigcup_{m \in Mod(K)} \{ m' \in Mod(p) : m \Delta m' \in Min_{\subseteq}(\{ m \Delta m' : m' \in Mod(p) \}) \}$$
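
A brute-force reading of this operator for small propositional vocabularies can be sketched as below. The representation is an assumption made for illustration: an interpretation is the frozenset of atoms it makes true, and formulas are Python predicates over interpretations; the names `models` and `pma_update` are hypothetical.

```python
from itertools import combinations

# Sketch of the update operator by exhaustive enumeration.
# Assumption: an interpretation is the frozenset of its true atoms and a
# formula is a Python function from interpretations to bool.

def models(formula, atoms):
    atoms = sorted(atoms)
    return [frozenset(c)
            for n in range(len(atoms) + 1)
            for c in combinations(atoms, n)
            if formula(frozenset(c))]


def pma_update(K, p, atoms):
    mod_p = models(p, atoms)
    result = set()
    for m in models(K, atoms):
        diffs = [m ^ m2 for m2 in mod_p]
        # Keep the models of p whose difference with m is inclusion-minimal.
        for m2 in mod_p:
            if not any(d < (m ^ m2) for d in diffs):
                result.add(m2)
    return result


# K says "a and not b"; the update formula p says "a implies b".
K = lambda m: "a" in m and "b" not in m
p = lambda m: "a" not in m or "b" in m
updated = pma_update(K, p, {"a", "b"})
```

With this input the update has two models: one where `a` has been given up and one where `b` has been added.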

In |T(i| ] , Eiter and Gottlob present complexity results for propositional knowledge
base revision and update. According to these results, Winslett’s update operator
is on the second level of the polynomial hierarchy in the general case (i.e.
without any syntactic restriction on the propositional formulas): the problem of
deciding whether a formula $q$ is a logical consequence of the update by $p$ of a
knowledge base $T$ is $\Pi^P_2$-complete.


Update             | General case,       | General case,   | Horn,           | Horn,
                   | arbitrary $p$       | $\|p\| \le k$   | arbitrary $p$   | $\|p\| \le k$
$T \circ p \to q$  | $\Pi^P_2$-complete  | co-NP-complete  | co-NP-complete  | $O(\|T\| \cdot \|q\|)$

In the above table, we summarize the results reported by Eiter and Gottlob. The table contains
five columns. In the general case (columns two and three), $T$ is a general propositional
knowledge base. In the Horn case (columns four and five), it is assumed
that $p$ and $q$ and all formulas in $T$ are conjunctions of Horn clauses. Columns
two and four account for cases where no bound is imposed on the length of
the update formula $p$, while columns three and five describe the case where the
length of $p$ is bounded by a constant $k$. The table illustrates that the general
problem in the worst case (arbitrary propositional formulas without bound on
the size) is intractable, whereas it becomes tractable (linear in the size
of $T$ and query $q$) in the case of Horn formulas with bounded size.

How are these results related to CQA? If $r$ is a database which is inconsistent
with respect to the set of integrity constraints $IC$, the derivation of a consistent
answer to a query $Q$ from $r$ corresponds to the derivation of $Q$ from the
database $r$ updated by the integrity constraints $IC$. Hence, the (inconsistent) knowledge
base instance $r$, which is just a conjunction of literals, corresponds to the
propositional knowledge base $T$. The integrity constraints $IC$ correspond to the
update formula $p$. And deriving an answer to query $Q$ from $r$ (and $IC$) corresponds
to the derivation of $Q$ from $r$ updated by $IC$.

Update is defined for propositional formulas, by means of
models of the knowledge base $r$ and the update formula $IC$. In our case, $r$ is a
finite conjunction of ground literals, i.e. $r$ is a propositional Horn formula. The
update formulas (integrity constraints $IC$), however, are FO formulas. Nevertheless,
the Herbrand universe of the database is a finite set of constants. Therefore, we
can consider instead of $IC$ the finite set of instantiations of the formulas in $IC$
by database constants. Let us denote the conjunction of these instantiations by
$ic$. Note that $ic$ is Horn whenever all formulas in $IC$ are Horn, which is common
for database ICs.

It is then easy to see that the following relationship holds between update 
and repairs and CQA. It follows straightforwardly from the definitions of repairs 
and update. 

Theorem 3. Given a database instance r and a set of integrity constraints IC 
with their propositional database representation ic: 

(a) $r'$ is a repair of $r$ wrt $IC$ iff $r' \in Mod(r \circ ic)$.

(b) If $Q$ is a query, $t$ is a consistent answer to $Q$ wrt $IC$ iff every model of $r \circ ic$ is
a model of $Q(t)$, i.e. $Mod(r \circ ic) \subseteq Mod(Q(t))$. □

In consequence, the results given by Eiter and Gottlob apply directly to CQA. 

The number of branches of a fully developed tableau is very high: in the
worst case, it contains $O(2^n)$ branches, where $n$ is the length of the formula.
Moreover, we have to find minimal elements within this exponential set, which
increases the complexity. Theorem |] tells us that we do not need to compare
the entire branches but only parts of them, namely the literals which
have been removed in order to open the tableau. This reduces the size of the
sets we have to compare, but not their number. Let us reconsider in Example 9
the point just before applying the tableaux rule which develops formula
$Supply(D, D_2, I_2) \wedge Class(I_2, T_4) \to D = C$. As we pointed out in the discussion
of Example 9, under some conditions, it is possible to avoid the development
of closed branches because we know in advance, without developing them, that
they will not be minimal.

Example 13. (Example 9 continued) In this case, $TP(IC \cup r)$ is the tree in Figure
2. This tree has two closed branches, $B_1$ and $B_2$, and one open branch $B_3$.
Each of these branches will receive an identical subtree due to the application
of the tableaux rules to the formulas not yet developed on the tree, namely
$(Supply(D, D_2, It_2) \wedge Class(It_2, T_4) \to D = C)$. We know at this stage of the
development that $B_1$ is closed due to $\neg Supply(C, D_1, It_1)$ and $B_2$ is closed due
to $\neg Class(It_1, T_4)$; $B_3$ is not closed. □

In this example, we can see that if we further develop the tree, every $B_i$ will
have the same sets of sub-branches, say $L_1, L_2, \ldots$, where $L_i$ is a set of literals.
The final fully developed tableau will then consist of the branches $B_1 \cup L_1$,
$B_1 \cup L_2$, ..., $B_2 \cup L_1$, $B_2 \cup L_2$, ..., $B_3 \cup L_1$, $B_3 \cup L_2$, .... If the final tableau
is closed, since $B_3$ is not closed, every $B_3 \cup L_j$ will be closed due to literals within
$L_j$, say $K_j$.

We have then two cases: either the literals in $K_j$ close due to literals in $r$
(which is the original inconsistent database instance) or they close due to literals
in the part of $B_3$ not in $r$. In the first case, these literals from $K_j$ will close every
branch of the tree (also $B_1$ and $B_2$). Since $B_1$ and $B_2$ were already closed,
they will be closed due to a set of literals that is strictly bigger than before,
and therefore they will not produce minimally closed branches (and no repairs).
In this situation, those branches can immediately be ignored and not further
developed. This can considerably reduce the size of the tableau. In this example,
at the end of the development, only $B_3$ will produce repairs (see Example 12).

In the second case, the literals in $K_j$ close due to literals in the part of $B_3$ that
are not in $r$. If these literals are not database literals (we have called them built-in
predicates), the branch cannot be opened, since we cannot repair inconsistencies
that are not due to database instances. Then, we only have to consider the case
of database literals that are not in $r$.


    $\forall x, y, z\,(Supply(x, y, z) \wedge Class(z, T_4) \to x = C)$
    $Supply(D, D_2, It_2)$
    $Supply(C, D_1, It_1)$
    $Class(It_1, T_4)$
    $Class(It_2, T_4)$

    $Supply(C, D_1, It_1) \wedge Class(It_1, T_4) \to C = C$

    $Supply(D, D_2, It_2) \wedge Class(It_2, T_4) \to D = C$

Fig. 2.


Since B 3 is open, those literals are negative literals (in the other case, B 3 
would not have been open, due to condition 2. in Definition ||). This is the 
only situation where the sub-branches which are closed at a previous point of 
development may still become minimal. In consequence, a reasonable heuristic
is to suspend the explicit development of already closed branches unless we
are sure that this case will not occur.

5 Consistent Query Answering 

In order to determine consistent answers to queries, we can also use, at least
at the theoretical level, a tableaux theorem prover to produce $TP(IC \cup r)$ and
its openings. Let us denote by $op(TP(IC \cup r))$ the tableau $TP(IC \cup r)$ with
its minimal openings: all branches which cannot be opened or which cannot be
minimally opened are pruned, and all branches which can be minimally opened
are kept (and opened). (We reconsider this pruning process in Section 6.2.)

According to Definition and Theorem |], t is a consistent answer to the 
open query $Q(x)$ when the combined tableau $op(TP(IC \cup r)) \otimes TP(\neg Q(t))$ (c.f.
Definition^) is, again, a closed tableau. In consequence, we might use the tableau
$op(TP(IC \cup r)) \otimes TP(\neg Q(x))$ in order to retrieve those values for $x$ that restore
the closure of all the opened branches in the tableau.

Example 14. Consider the functional dependency

$IC: \forall(x, y, z, u, v)(Student(x, y, z) \wedge Student(x, u, v) \to y = u \wedge z = v)$;

and the inconsistent students database instance

$r = \{Student(S_1, N_1, D_1), Student(S_1, N_2, D_1), Course(S_1, C_1, G_1), Course(S_1, C_2, G_2)\}$,

which has two repairs, namely

$r_1 = \{Student(S_1, N_1, D_1), Course(S_1, C_1, G_1), Course(S_1, C_2, G_2)\}$,

$r_2 = \{Student(S_1, N_2, D_1), Course(S_1, C_1, G_1), Course(S_1, C_2, G_2)\}$.

We can distinguish two kinds of queries. The first one corresponds to a first
order formula containing free variables (not quantified), and then expects a
(set of database) tuple(s) as answer. For example, we want the consistent answers
to the query “$Course(x, y, z)$?”. Here we have that
$op(TP(IC \cup r)) \otimes TP(\neg Course(x, y, z))$ is closed for the tuples $(S_1, C_1, G_1)$ and $(S_1, C_2, G_2)$.

A second kind of queries corresponds to queries without free variables, i.e. to
sentences. They should get the answer “yes” or “no”. For example, consider the
query “$Course(S_1, C_2, G_2)$?”. Here $op(TP(IC \cup r)) \otimes TP(\neg Course(S_1, C_2, G_2))$
is closed. The answer is “yes”, meaning that the sentence is true in all repairs.

Now, consider the query “$Student(S_1, N_2, D_1)$?”. The tableau
$op(TP(IC \cup r)) \otimes TP(\neg Student(S_1, N_2, D_1))$ is not closed, and $Student(S_1, N_2, D_1)$ is not a
member of both repairs. The answer is “no”, meaning that the query is not true
in all repairs. □
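
The answers of Example 14 can be cross-checked by enumerating repairs directly instead of building tableaux. The sketch below is not the tableau procedure: it assumes a tuple encoding of facts and relies on the fact that, for a functional dependency, repairs are the maximal consistent subsets of $r$. The function names are hypothetical.

```python
from itertools import combinations

# Brute-force cross-check of Example 14 (not the tableau method).
# Assumption: a fact is a tuple ("Relation", arg1, arg2, arg3).

def satisfies_fd(instance):
    """FD of Example 14: the first attribute of Student determines the rest."""
    students = [t for t in instance if t[0] == "Student"]
    return all(s[2:] == t[2:]
               for s in students for t in students if s[1] == t[1])


def repairs(r):
    """Maximal consistent subsets of r (for an FD, insertions never help)."""
    facts = sorted(r)
    consistent = [frozenset(c)
                  for n in range(len(facts) + 1)
                  for c in combinations(facts, n)
                  if satisfies_fd(c)]
    return [s for s in consistent if not any(s < t for t in consistent)]


def consistent_answers(r, relation):
    """Tuples of `relation` that belong to every repair of r."""
    per_repair = [{t[1:] for t in rep if t[0] == relation}
                  for rep in repairs(r)]
    return set.intersection(*per_repair)


r = {("Student", "S1", "N1", "D1"), ("Student", "S1", "N2", "D1"),
     ("Course", "S1", "C1", "G1"), ("Course", "S1", "C2", "G2")}
course_answers = consistent_answers(r, "Course")
student_answers = consistent_answers(r, "Student")
```

Both `Course` tuples are consistent answers, while neither `Student` tuple is, which agrees with the “yes” and “no” answers above.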

The following example shows that, as opposed to (I|, we are able to treat 
existential queries in a proper way. 

Example 15. Consider the query “$\exists x\, Course(x, C_2, G_2)$?” for the database in
Example 14. Here we have that $op(TP(IC \cup r)) \otimes TP(\neg\exists x\, Course(x, C_2, G_2))$ is
closed. The second tableau introduces the formulas $\neg Course(p, C_2, G_2)$, for every
$p \in D \cup P$, in every branch. The answer is “yes”. This answer has been obtained
by replacing $p$ by the same constant $S_1$ in both branches. This does not need
to be always the case. For example, with the query “$\exists x\, Student(S_1, x, D_1)$?”,
that introduces the formulas $\neg Student(S_1, p, D_1)$ in every branch of
$op(TP(IC \cup r)) \otimes TP(\neg\exists x\, Student(S_1, x, D_1))$, the tableau closes and the answer is “yes”, but
one repair has been closed for $p = N_1$ and the other repair has been closed for
$p = N_2$.

We can also handle open existential queries. Consider now the query with $y$
as the free variable “$\exists z\, Course(S_1, y, z)$?”. The tableau for
$op(TP(IC \cup r)) \otimes TP(\neg\exists z\, Course(S_1, y, z))$, which introduces the formulas $\neg Course(S_1, y, p)$ in
every branch, is closed, actually by $y = C_1$, and also by $y = C_2$, but for two
different values for $p$, namely $G_1$ and $G_2$, resp. □

Theorem 4. Let $r$ be an inconsistent database wrt the set of integrity constraints $IC$.

1. Let $Q(x)$ be an open query with the free variables $x$. A ground tuple $t$ is a
consistent answer to $Q(x)$ iff $op(TP(IC \cup r)) \otimes TP(\neg Q(x))$ is closed for the
substitution $x \mapsto t$.

2. Let $Q$ be a query without free variables. The answer is “yes”, meaning that
the query is true in all repairs, iff $op(TP(IC \cup r)) \otimes TP(\neg Q)$ is closed.

6 CQA, Minimal Entailment and Tableaux 

As the following example shows, CQA is a form of non-monotonic entailment,
i.e. given a relational database instance $r$, a set of ICs $IC$, and a consistent
answer $P(a)$ wrt $IC$, i.e. $r \models_c P(a)$, it may be the case that $r' \not\models_c P(a)$, for an
instance $r'$ that extends $r$.

Example 16. The database containing the table 


Employee | Name     | Salary
         | J.Page   | 5000
         | V.Smith  | 3000
         | M.Stowe  | 7000


is consistent wrt the FD $f_1$: Name → Salary. In consequence, the set of consistent
answers to the query $Q(x, y)$: $Employee(x, y)$ is {(J.Page, 5000), (V.Smith,
3000), (M.Stowe, 7000)}. If we add the tuple (J.Page, 8000) to the database,
the set of consistent answers to the same query is reduced to {(V.Smith, 3000),
(M.Stowe, 7000)}. □
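
The loss of monotonicity in Example 16 can be reproduced with a short sketch. It uses a shortcut that is valid for this particular query and this single FD, and is an assumption of the sketch rather than a general method: a tuple belongs to every repair exactly when no other tuple with the same name carries a different salary.

```python
from collections import defaultdict

# Sketch for Example 16. Shortcut assumed here: with the single FD
# Name -> Salary, a tuple is in every repair iff its name has one salary.

def consistent_employee_answers(rows):
    salaries = defaultdict(set)
    for name, salary in rows:
        salaries[name].add(salary)
    return {(name, salary) for name, salary in rows
            if len(salaries[name]) == 1}


employee = {("J.Page", 5000), ("V.Smith", 3000), ("M.Stowe", 7000)}
before = consistent_employee_answers(employee)
after = consistent_employee_answers(employee | {("J.Page", 8000)})
```

Here `after` is a proper subset of `before` although the database has grown, which is the non-monotonic behaviour described in the example.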


We may be interested in having a logical specification $Spec_r$ of the repairs of
the database instance $r$. In this case, we could consistently answer a query $Q(x)$
by asking for those $t$ such that

$$Spec_r \mathrel{|\!\approx} Q(t) \;\equiv\; r \models_c Q(t), \qquad (1)$$

where $\mathrel{|\!\approx}$ is a new, suitable consequence relation, that, as the example shows,
has to be non-monotonic.



Database Repairs and Analytic Tableaux 




6.1 A circumscriptive characterization of CQA 

Notice that with CQA we have a minimal entailment relation in the sense that
consistent answers are true of certain minimal models, those that minimally differ
from the original instance. This is a more general reason for obtaining a non-monotonic
consequence relation. Actually, the database repairs can be specified
by means of a circumscription axiom | j28|]2(| that has the effect of minimizing 
the set of changes to the original database performed in order to satisfy the ICs. 

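Let $P_1, \ldots, P_n$ be the database predicates in $\mathcal{L}$. In the original instance $r$,
each $P_i$ has a finite extension that we also denote by $P_i$. Let $R_1, \ldots, R_n$ be
new copies of $P_1, \ldots, P_n$, standing for the corresponding tables in the database
repairs. Define, for $i = 1, \ldots, n$,

$$\forall \bar{x}\,[P_i^{in}(\bar{x}) \;\stackrel{def}{\longleftrightarrow}\; (R_i(\bar{x}) \wedge \neg P_i(\bar{x}))], \qquad (2)$$

$$\forall \bar{x}\,[P_i^{out}(\bar{x}) \;\stackrel{def}{\longleftrightarrow}\; (P_i(\bar{x}) \wedge \neg R_i(\bar{x}))]. \qquad (3)$$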


Consider now the theory $\Sigma$ consisting of axioms (2), (3) plus $r$, i.e. the (finite)
conjunction of the atoms in the database, plus $IC(P_1/R_1, \ldots, P_n/R_n)$, i.e. the set
of ICs, but with the original database predicates replaced by the new predicates;
and possibly, axioms for the built-in predicates, e.g. equality.

In order to minimize the set of changes, we circumscribe in parallel the predicates
$P_i^{in}, P_i^{out}$ in the theory $\Sigma$, with variable predicates $R_1, \ldots, R_n$, and fixed
predicates $P_1, \ldots, P_n$ ©|, that is, we consider the following circumscription

$$Circum(\Sigma;\; P_1^{in}, \ldots, P_n^{out};\; R_1, \ldots, R_n;\; P_1, \ldots, P_n). \qquad (4)$$

The semi-colons separate the theory, the predicates minimized in parallel, the
variable predicates and the fixed predicates, in that order.

We want to minimize the differences between a database repair and the original
database instance. For this reason we need the $R_i$ to be flexible in the
minimization process. The original predicates $P_i$ are not subject to changes,
because the changes can be read from the $R_i$ (or from their differences with the
$P_i$).

Example 17. Consider $r = \{P(a)\}$ and $IC = \{\forall x(P(x) \to Q(x))\}$. In this case,
$\Sigma$ consists of the following sentences: $P(a)$, $\forall x(R_P(x) \to R_Q(x))$,
$\forall x(P^{in}(x) \leftrightarrow R_P(x) \wedge \neg P(x))$, $\forall x(P^{out}(x) \leftrightarrow P(x) \wedge \neg R_P(x))$,
$\forall x(Q^{in}(x) \leftrightarrow R_Q(x) \wedge \neg Q(x))$, $\forall x(Q^{out}(x) \leftrightarrow Q(x) \wedge \neg R_Q(x))$.
Here the new database predicates are $R_P$ and
$R_Q$. They vary when $P^{in}, P^{out}, Q^{in}, Q^{out}$ are minimized.

The models of the circumscription are the minimal (classical) models of the
theory $\Sigma$. A model $\mathfrak{M} = \langle M, (P^{in})^M, (P^{out})^M, (Q^{in})^M, (Q^{out})^M, R_P^M, R_Q^M, P^M, Q^M, a^M \rangle$
is minimal if there is no other model with the same domain $M$ that
interprets $P, Q, a$ in the same way as $\mathfrak{M}$ and has at least one of the interpretations
of $P^{in}, P^{out}, Q^{in}, Q^{out}$ strictly included in the corresponding one in $\mathfrak{M}$ and the others
(not necessarily strictly) included in the corresponding ones in $\mathfrak{M}$. □
Circumscription (4) can be specified by means of a second-order axiom



( 5 ) 


The first conjunct emphasizes the fact that the theory is expressed in terms of
the predicates shown there. Those predicates are replaced by second-order variables
in the $\Sigma$ in the quantified part of the formula. The circumscription axiom
says that the change predicates $P_i^{in}, P_i^{out}$ have the minimal extension under set
inclusion among those that satisfy the ICs. It is straightforward to prove that
the database repairs are in one to one correspondence with the restrictions to
$R_1, \ldots, R_n$ of those Herbrand models of the circumscription that have domain
$D$ and the extensions of the predicates $P_1, \ldots, P_n$ as in the original instance $r$.

An alternative to externally fixing the domain $D$ consists in minimizing the
finite active domain, that is a subset of $D$. This can be achieved by means
of a circumscription as well, and then that domain can be extended to the
whole of $D$. Notice that in order to capture the unique names assumption of
databases, the equality predicate could be minimized. Furthermore, if we want
the minimal models to have the extensions for the $P_i$ as in $r$, we can either
include in $\Sigma$ predicate closure axioms of the form $\forall \bar{x}(P_i(\bar{x}) \leftrightarrow \bigvee_j \bar{x} = \bar{a}_j)$ if
$P_i$’s extension is non-empty and $\forall \bar{x}(P_i(\bar{x}) \leftrightarrow \bar{x} \neq \bar{x})$ if it is empty; or apply to
those predicates the closed world assumption, that can also be captured by means
of circumscription. See p 6 f for details. Another alternative is to fix the domain
$D$ and replace everywhere $r$ in $\Sigma$ by the first-order sentence, $\sigma(r)$, corresponding
to Reiter’s logical reconstruction of database instance $r$ Q. We do not do any
of this explicitly, but leave it as something to be captured at the implementation
level.

Example 18. (Example 17 continued) The minimal models of the circumscription
of the theory are $\langle D, \emptyset, \{a\}, \emptyset, \emptyset, \emptyset, \emptyset, \{a\}, \emptyset \rangle$ and $\langle D, \emptyset, \emptyset, \{a\}, \emptyset, \{a\}, \{a\}, \{a\}, \emptyset \rangle$,
that show first the domain and next the extensions of $P^{in}, P^{out}, Q^{in}, Q^{out}, R_P, R_Q, P, Q$,
in this order. The first model corresponds to repairing the
database by deleting $P(a)$; the second, to inserting $Q(a)$. □
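
For this tiny theory the minimal models can be recomputed by exhaustive search. The sketch below assumes the domain is just `{a}` and that $P$ and $Q$ are fixed as in $r$. It enumerates the extensions of the varying predicates $R_P$ and $R_Q$, derives the change predicates from the axioms of Example 17, and keeps the models whose change predicates are inclusion-minimal in parallel. All names in the code are hypothetical.

```python
from itertools import combinations

# Exhaustive search for the minimal models of Example 17.
# Assumptions: the domain is {a}; P and Q are fixed as in r = {P(a)}.
DOMAIN = ["a"]
P = frozenset({"a"})
Q = frozenset()


def subsets(xs):
    return [frozenset(c)
            for n in range(len(xs) + 1) for c in combinations(xs, n)]


def change_sets(RP, RQ):
    """Extensions of (P_in, P_out, Q_in, Q_out) induced by R_P and R_Q."""
    return (RP - P, P - RP, RQ - Q, Q - RQ)


def smaller(c1, c2):
    """c1 is componentwise included in c2 and differs from it."""
    return all(x <= y for x, y in zip(c1, c2)) and c1 != c2


def minimal_models():
    # The IC over the new predicates: R_P(x) -> R_Q(x), i.e. R_P ⊆ R_Q.
    candidates = [(RP, RQ)
                  for RP in subsets(DOMAIN) for RQ in subsets(DOMAIN)
                  if RP <= RQ]
    return [m for m in candidates
            if not any(smaller(change_sets(*n), change_sets(*m))
                       for n in candidates)]


result = set(minimal_models())
```

The two surviving pairs `(R_P, R_Q)` are `(∅, ∅)` and `({a}, {a})`, matching the deletion of $P(a)$ and the insertion of $Q(a)$.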

By playing with different kinds of circumscription, e.g. introducing priorities
[ p5| , or considering only some change predicates, e.g. only the $P_i^{out}$’s (only deletions),
preferences for some particular kinds of database repairs could be captured. We
do not explore this direction any further here.

The original theory $\Sigma$ can be written as $\Sigma' \wedge r$, where $\Sigma'$ is formed by
all the conjuncts in $\Sigma$, except for $r$. It is easy to see that the circumscription
$Circum(\Sigma; P_1^{in}, \ldots, P_n^{out}; R_1, \ldots, R_n; P_1, \ldots, P_n)$ is logically equivalent to
$r \wedge Circum(\Sigma'; P_1^{in}, \ldots, P_n^{out}; R_1, \ldots, R_n; P_1, \ldots, P_n)$. In consequence, we can replace
(1) by

$$r \wedge Circum(\Sigma';\; P_1^{in}, \ldots, P_n^{out};\; R_1, \ldots, R_n;\; P_1, \ldots, P_n) \models Q(t) \;\equiv\; r \models_c Q(t). \qquad (6)$$
We can see that in this case the nonmonotonic consequence relation $\mathrel{|\!\approx}$ corresponds
to classical logical consequence, but with the original data put in
conjunction with a second-order theory.

Some work has been done on detecting conditions and developing algorithms 
for the collapse of a (second-order) circumscription to a first-order theory . 

The same for collapsing circumscription to logic programs (2(J . In our case, this 
would not be surprising. In direct specifications of database repairs by 

means of logic programs are presented. 

In our case, there is not much hope in having the circumscription collapse
to a first-order sentence, $\varphi_{Circ}$. If this were the case, CQA would be feasible
in polynomial time in the size of the database, because then for a query $Q$,
the query $(\varphi_{Circ} \to Q)$ could be posed to the original instance $r$. As shown in
[ PI , CQA can be coNP-complete, even with simple functional dependencies and
(existentially quantified) conjunctive queries. Actually, in the general case CQA
is undecidable (to appear in an extended version of [Q ).

Under those circumstances, it seems a natural idea to explore to what extent
semantic tableaux can be used for CQA. Actually, some implementations of
nonmonotonic reasoning, more precisely of minimal entailment, based on semantic
tableaux have been proposed in |pl
6.2 Towards implementation

The most interesting proposal for implementing first order circumscriptive reasoning
with semantic tableaux is offered by Niemelä in [30], where optimized
techniques for developing tableaux branches and checking their minimality are
introduced. The techniques presented there, that allow minimized, variable and
fixed predicates, could be applied in our context, either directly, appealing to the
circumscriptive characterization of CQA we gave before, or adapting Niemelä’s
techniques to the particular kind of process we have at hand, in terms of minimal
opening of branches in the tableau $TP(IC \cup r)$.$^8$ We will briefly explore this
second alternative.

As in [30], we assume in this section that (a) the semantic tableaux are
applied to formulas in clausal form, and (b) only Herbrand models are considered,
which in our case represents no limitation, because our openings, repairs, etc. are
all Herbrand structures. Furthermore, if $IC$ contains safe formulas |3^], which
is commonly required in database applications, we can restrict the Herbrand
domain to be the finite active domain of the database.

As seen in Section 5, consistently answering query $Q$ from instance $r$ wrt $IC$
can be based on the combination of $op(TP(IC \cup r))$ and $TP(\neg Q(x))$. Nevertheless,
explicitly having the first, pruned, tableau amounts to having also explicitly
all possible repairs of the original database. Moreover, this requires having verified
the property of minimality in the data closed branches, possibly comparing
different branches wrt inclusion. It is more appealing to check minimality as
the tableau $TP(IC \cup r)$ is developed.

8 Notice that the input theory in this case differs from the theory to which the circumscription
is applied in the previous section.

Notice that if a finished branch $B \in TP(IC \cup r)$, opened after a preliminary
data closure was reached, remains open for $x = t$ when combined with
$TP(\neg Q(x))$, then $op(B)$ is a model of $IC$ and $\neg Q(t)$, and in consequence $op(B)$
provides a counterexample to $IC \models Q(t)$. However, this is classical entailment,
and we are interested in those models of $IC$ that minimally differ from $r$. In consequence,
$op(B)$ may not be a counterexample for our problem of CQA, because
it may not correspond to a repair of the original instance. Such branches
that would lead to a non minimal opening in $TP(IC \cup r)$ should be closed, and
left closed exactly as those branches that were closed due to built-ins.

As we can see, what is needed is a methodology for developing the tableaux
such that: (a) Each potential counterexample is explored, and hopefully at most
once. (b) Being a non minimal opening is treated as a closure condition (because,
as we just saw, such openings do not provide appropriate counterexamples). (c) The minimality
condition is checked locally, without comparison with other branches,
which is much more efficient in terms of space.

Such a methodology is proposed in [30], with two classical rules for generating
tableaux, a kind of hyper-type rule, and a kind of cut rule. The closure conditions
are as in the classical case, but a new closure condition is added, to close branches
that do not lead to minimal models. This is achieved by means of a “local”
minimality test, that can also be found in [ p9|Jl7)| . We can adapt and adopt such
a test in our framework on the basis of the definition of grounded model given
in [30] and our circumscriptive characterization of CQA given above.

Let $B$ be a data closed branch in $TP(IC \cup r)$, with $op(B) = (r \setminus L) \cup K$.
We associate to $B$ a Herbrand structure $M(B)$ over the first order language
$\mathcal{L}(K, L, P, R)$, where $R = \langle R_1, \ldots, R_n \rangle$ is the list of original database predicates,
$P = \langle P_1, \ldots, P_n \rangle$ is the list of predicates for the repaired versions of
the $R_i$’s, and $L = \langle L_1, \ldots, L_n \rangle$, $K = \langle K_1, \ldots, K_n \rangle$ are predicates for $R_i \setminus P_i$
and $P_i \setminus R_i$, resp. (Then it makes sense to identify the lists of predicates $L$
and $K$ with the sets of differences $L$ and $K$ in the branch $B$.)
$M(B) = \langle Act(r), L^B, K^B, P^B, R^B \rangle$ is defined through (and can be identified with) the
subset $A := \bigcup_{i=1}^{n} L_i^B \cup \bigcup_{i=1}^{n} K_i^B \cup \bigcup_{i=1}^{n} P_i^B \cup \bigcup_{i=1}^{n} R_i^B$ of the Herbrand base $\mathcal{B}$, where
$\bigcup_{i=1}^{n} R_i^B$ coincides with the database contents $r$, and the elements in $\bigcup_{i=1}^{n} P_i^B$ are
taken from $op(B)$.

Now we can reformulate for our context the notion of grounded Herbrand
structure given in [30].

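Definition 7. (adapted from [30]) An opening $op(B)$ is grounded iff for all $p \in K \cup L$ with $p(t) \in A$ it holds

$$IC(P_1/R_1, \ldots, P_n/R_n) \,\cup\, \Big\{ \bigwedge_{i=1}^{n} (L_i = R_i \setminus P_i),\; \bigwedge_{i=1}^{n} (K_i = P_i \setminus R_i) \Big\} \,\cup\, N^{\langle L, K, R \rangle}(A) \models p(t), \qquad (7)$$

where $N^{\langle L, K, R \rangle}(A) := \{ \neg q(t) \mid q \in L \cup K \cup R \text{ and } q(t) \in \mathcal{B} \setminus A \} \cup \{ q(t) \mid q \in R \text{ and } q(t) \in A \}$.

□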
Notice that the first set in the union that defines $N^{\langle L, K, R \rangle}(A)$ corresponds
to the CWA applied to the minimized predicates, i.e. those in $L$, $K$, and the fixed
predicates, i.e. those in $R$. The second set coincides with the original database
contents $r$. From the results in [30] and our circumscriptive characterization of
CQA, we obtain the following theorem.

Theorem 5. An opening op(B) corresponds to a database repair iff M(B) is a 
grounded model of ( 0 ). □ 

Ungrounded models can be discarded, and then ungroundedness can be used 
as an additional closure condition on branches. Notice that the test is local to 
a branch and can be applied at any stage of the development of a branch, even 
when it is not finished yet. The test is based on classical logical consequence, 
and then not on any kind of minimal entailment. 

Example 19. (Example 11 continued) We need some extra predicates. $P_P, P_Q, P_R$
stand for the repaired versions of $P, Q, R$, resp. $L_P, L_Q, L_R, K_P, K_Q, K_R$ stand
for $P \setminus P_P, \ldots, P_R \setminus R$, resp. Here $L = \langle L_P, L_Q, L_R \rangle$, $K = \langle K_P, K_Q, K_R \rangle$,
$P = \langle P_P, P_Q, P_R \rangle$, $R = \langle P, Q, R \rangle$.

In order to check groundedness for branches, we have the underlying theory
$\Sigma = \{\forall x(P_P(x) \to P_Q(x)),\; \forall x(L_P(x) \leftrightarrow (P(x) \wedge \neg P_P(x))),\; \ldots,\; \forall x(K_R(x) \leftrightarrow (P_R(x) \wedge \neg R(x)))\}$,
corresponding to (7).

In order to check the minimality of branch $B_1$, we consider $M(B_1)$, that
is determined by the set of ground atoms $A(B_1) = \{P(a), R(b), L_P(a), P_R(b)\}$.
First, this structure satisfies $\Sigma$. Now, for this branch

$$N^{\langle L, K, R \rangle}(A(B_1)) = \{\neg L_P(b), \neg L_Q(a), \neg L_Q(b), \neg L_R(a), \neg L_R(b), \neg K_P(a), \neg K_P(b), \neg K_Q(a), \neg K_Q(b), \neg K_R(a), \neg K_R(b), \neg P(b), \neg Q(a), \neg Q(b), \neg R(a)\} \cup \{P(a), R(b)\}.$$

For groundedness, we have to check if $L_P(a)$ is a classical logical consequence of
$\Sigma \cup N^{\langle L, K, R \rangle}(A(B_1))$. This is true, because, from $\neg K_Q(a)$ and $\neg Q(a)$, we obtain $\neg P_Q(a)$.
Using the contrapositive of the IC in $\Sigma$, we obtain $\neg P_P(a)$. Together with $P(a)$, the
definition of $L_P$ then gives $L_P(a)$.

In consequence, the opening corresponding to branch $B_1$ is a repair of the
original database.

Consider now the unfinished branch P 3 , for which A{B^) = {P(a), R(b), 
K Q (b), R P (a), R Q (b), R R (b)}, and 

N <L ’ r ’ r> (A(B 3 )) = (-iPp(a), -iPp(6), -iPg(a), -iLq(6 ), ->L R {a), ->L R (b), 
-iA'p(a), -iAp(6), -iATg(a), -> K R (a), -> K R (b), ->P(b), 
~ n Q(a),^Q{b),^R(a)} U {P(a),R(b)}. 

We have to apply the groundedness test to A'q( 6 ). In this case it is not possible 
to derive this atom from E U N <L ’ K ’ R> (A(B 3 )), meaning that the set of literal 
is not grounded. If we keep developing that branch, the set N can only shrink. 
In consequence, we will not derive the atom in the extensions. We can stop 
developing branch P 3 because we will not get a minimal opening. □ 
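
The derivation for the first branch of Example 19 can be replayed mechanically. The following Python sketch is an illustration only and is not part of the methodology presented in this paper: it assumes the two-constant domain {a, b}, grounds the underlying theory over it, and decides classical consequence by brute-force enumeration of the atoms that are left open. The string encoding of atoms and the helper names (`sigma_holds`, `models`, `entails`, `n_literals`) are hypothetical. The helper `n_literals` fixes the database literals and negates every deletion/insertion atom other than the one being tested.

```python
from itertools import product

CONSTS = ["a", "b"]
BASE = ["P", "Q", "R"]               # original predicates
PREFIXES = ["", "R_", "L_", "K_"]    # original, repaired, deleted, inserted

def atom(pred, c):
    return f"{pred}({c})"

ALL_ATOMS = [atom(pre + p, c) for p in BASE for c in CONSTS for pre in PREFIXES]

def sigma_holds(v):
    """Check the theory, grounded over CONSTS; v maps atoms to booleans."""
    for c in CONSTS:
        # IC on the repaired predicates: R_P(x) -> R_Q(x)
        if v[atom("R_P", c)] and not v[atom("R_Q", c)]:
            return False
        for p in BASE:
            # L_p(x) <-> p(x) and not R_p(x)   (deleted tuples)
            if v[atom("L_" + p, c)] != (v[atom(p, c)] and not v[atom("R_" + p, c)]):
                return False
            # K_p(x) <-> R_p(x) and not p(x)   (inserted tuples)
            if v[atom("K_" + p, c)] != (v[atom("R_" + p, c)] and not v[atom(p, c)]):
                return False
    return True

def models(fixed):
    """All truth assignments that extend `fixed` and satisfy the grounded theory."""
    free = [x for x in ALL_ATOMS if x not in fixed]
    found = []
    for bits in product([False, True], repeat=len(free)):
        v = {**fixed, **dict(zip(free, bits))}
        if sigma_holds(v):
            found.append(v)
    return found

def entails(fixed, goal):
    """Classical consequence: goal is true in every model extending `fixed`."""
    return all(v[goal] for v in models(fixed))

def n_literals(db, tested):
    """Database literals plus the negation of every L/K atom except `tested`."""
    fixed = {}
    for p in BASE:
        for c in CONSTS:
            fixed[atom(p, c)] = atom(p, c) in db
            for pre in ("L_", "K_"):
                a = atom(pre + p, c)
                if a != tested:
                    fixed[a] = False
    return fixed

db = {"P(a)", "R(b)"}
n_b1 = n_literals(db, "L_P(a)")
b1_models = models(n_b1)                 # non-empty, so the entailment is not vacuous
grounded_b1 = entails(n_b1, "L_P(a)")
print(len(b1_models), grounded_b1)
```

Because only seven atoms remain open once the literals are fixed, exhaustive enumeration is cheap here. A realistic implementation would replace it by a call to a theorem prover or SAT solver.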

7 Conclusions

We have presented the theoretical basis for a treatment of consistent query answering
in relational databases by means of analytic tableaux. We have mainly
concentrated on the interaction of the database instance and the integrity constraints,
and on the problem of representing database repairs by means of opened
tableaux. However, we also showed how the analytic tableaux methodology could
be used for consistent query answering.

We established the connections between the problem of consistent query answering
and knowledge base update, on one side, and circumscriptive reasoning,
on the other. This is not surprising, since the relationship between knowledge 
base update and circumscription has already been studied by Winslett 
(see also ID)- 

The connection of CQA to updates and minimal entailment allowed us to
apply known complexity results to our scenario. Furthermore, we have seen that
the reformulation of the problem of CQA as one of computing circumscription
opens the possibility of applying established semantic tableaux based
methodologies for circumscriptive reasoning.

As we have seen, there are several similarities between our approach to consistency
handling and those followed by the belief revision/update community.
Database repairs coincide with revised models defined by Winslett in |h]]. The 
treatment in f39| is mainly propositional, but a preliminary extension to first 
order knowledge bases can be found in 0- Those papers concentrate on the 
computation of the models of the revised theory, i.e., the repairs in our case, but 
not on query answering. Comparing our framework with that of belief revision,
we have an empty domain theory, a single model (the database instance), and a
revision by a set of ICs. The revision of a database instance by the ICs produces
new database instances, the repairs of the original database.

Nevertheless, our motivation and starting point are quite different from those 
of belief revision. We are not interested in computing the repairs per se, but in
answering queries, ideally using the original database as much as possible,
possibly posing a modified query. If this is not possible, we look for methodologies
for representing and querying simultaneously and implicitly all the repairs
of the database. Furthermore, we work in a fully first-order framework. Other 
connections to belief revision/update can be found in jl|]. 

To the best of our knowledge, the first treatment of CQA in databases goes 
back to H. The approach is based on a purely proof-theoretic notion of consistent 
query answer. This notion, described only in the propositional case, is more 
restricted than the one we used in this paper. In (li|, Cholvy presents a general 
logic framework for reasoning about contradictory information which is based on 
an axiomatization in modal propositional logic. Instead, our approach is based 
on classical first order logic. 

Other approaches to consistent query answering based on logic programs 
with stable model semantics were presented in [pl|(],|2^| . They can handle general 
first order queries with universal ICs. 

There are many open issues. One of them has to do with the possibility of
obtaining from the tableaux for instances and ICs the right “residues” that can 
be used to rewrite a query as in jl]]. The theoretical basis of CQA proposed in
[|lj was refined and implemented in (TTJ . A comparison of the tableaux based
methodology for CQA and the “rewriting based approach” presented in those
papers is an open issue. However, query rewriting cannot be applied to existential
queries like the one in example ju], whereas the tableaux methodology
can be used. Perhaps an appropriate use of tableaux could make possible an
extension of the rewriting approach to syntactically richer queries and ICs.

Another interesting open issue has to do with the fact that we have treated 
Skolem parameters as null values. It would be interesting to study the applicability
in our scenario of methodologies for query evaluation in databases in the
presence of null values like the one presented in Q. 

In this paper we have concentrated mostly on the theoretical foundations of 
a methodology based on semantic tableaux for querying inconsistent databases. 
Nevertheless, the methodology for CQA requires further investigation. In this 
context, the most interesting open problems have to do with implementation issues.
More specifically, the main challenge consists in developing heuristics and
mechanisms for using a tableaux theorem prover to generate/store/represent
$TP(IC \cup r)$ in a compact form with the purpose of: (a) applying the database
assumptions, (b) interacting with a DBMS on request, in particular, without
replicating the whole database instance at the tableau level, (c) detecting and
producing the minimal openings (only), (d) using a theorem prover (in combination
with a DBMS) in order to consistently answer queries.

An important issue in database applications is that queries usually have free
variables, so answer sets have to be retrieved as a result of the automated
reasoning process. Notice that once we have $op(TP(IC \cup r))$, we need to be
able to: (a) use it for different queries $Q$, (b) process the combined tableau
$op(TP(IC \cup r)) \otimes TP(\neg Q)$ in a “reasonable and practical” way. We have seen
that existing methodologies and algorithms, like the one presented in |d(J , can be
used in this direction. However, producing a working implementation, considering
all kinds of optimizations with respect to representation and development of
the tableaux, grounding techniques, database/theorem-prover interaction, etc.,
is a major task that deserves separate investigation.

Appendix: Proofs

Proof of Lemma |3] 

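By Lemma [I], we have $r' \Delta r = r \setminus r'$ and $r'' \Delta r = r \setminus r''$. Then $l \in r' \Delta r$ iff
$l \in r \setminus r'$, i.e. $l \in r$ and $l \notin r'$, from which it follows that $l \in r$ and $l \notin r''$. Hence
$l \in r \setminus r'' = r'' \Delta r$.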


Proof of Lemma |i] 

Let $r'$ be an opening of $r$. Then $r' = (r \setminus L) \cup K$, where $L = \{l : l \in r \text{ and } \neg l \in I\}$
and $K = \{l : l \in I \text{ and there is no substitution } \sigma \text{ such that } l\sigma \in r\}$. Let us first
observe that $L \cap K = \emptyset$, since $L \subseteq r$ and, for $l \in K$, $l \notin r$. We show that
$r \Delta r' = L \cup K$. Let $x \in r \Delta r'$.

1. Case $x \in r$ and $x \notin r'$. Then $x \notin K$ and $x \notin (r \setminus L)$. But from this, we get
$x \in L$, hence $x \in L \cup K$.

2. Case $x \notin r$ and $x \in r'$, iff $x \notin r$ and (($x \in r$ and $x \notin L$) or $x \in K$), iff $x \notin r$
and $x \in K$, from which it follows that $x \in K \cup L$.

On the other hand, let $x \in L \cup K$. Again, we consider two cases:

1. Case $x \in L$. Then, by definition, $\neg x \in I$. Then $x \notin r \setminus L$ and, since $I$ is open,
$x \notin I$. From this, we get $x \notin K$ and, since $r' = (r \setminus L) \cup K$, $x \notin r'$, from which
it follows that $x \in r \Delta r'$.

2. Case $x \in K$. Then $x \in I$ and $x \notin r$. But then $x \in r'$ and therefore $x \in r \Delta r'$.
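
The identity established in this proof can be checked on small ground instances. The Python sketch below is an illustration under simplifying assumptions, not the construction used in the paper: all literals are ground, so the condition on substitutions reduces to the atom not being in the instance, and a branch is encoded as a set of `(is_positive, atom)` pairs. The function name `opening` and this encoding are hypothetical.

```python
def opening(r, branch):
    """Ground-case opening of instance r by an open branch; returns (r', L, K)."""
    # L: atoms of r whose negation occurs in the branch (to be deleted)
    L = {a for a in r if (False, a) in branch}
    # K: positive atoms of the branch that are missing from r (to be inserted)
    K = {a for (positive, a) in branch if positive and a not in r}
    return (r - L) | K, L, K

r = {"P(a)", "R(b)"}
# The two open branches of the ground constraint P(a) -> Q(a)
for branch in [{(False, "P(a)")}, {(True, "Q(a)")}]:
    r_new, L, K = opening(r, branch)
    assert L.isdisjoint(K)          # L and K never overlap
    assert r ^ r_new == L | K       # symmetric difference equals L union K
    print(sorted(r_new), sorted(L), sorted(K))
```

The first branch deletes `P(a)` and the second inserts `Q(a)`. In both cases the symmetric difference between the instance and its opening is exactly the union of the deleted and inserted atoms.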

Proof of Proposition [3j 

By Lemma [I], we have $r \Delta r_1 = L_1 \cup K_1$ and $r \Delta r_2 = L_2 \cup K_2$. From $r_1 \leq_r r_2$ we
then get $L_1 \cup K_1 \subseteq L_2 \cup K_2$. Since $L_i \cap K_i = \emptyset$, we have $L_1 \subseteq L_2$ and $K_1 \subseteq K_2$.

Proof of Theorem ||j 

Let $r'$ be a repair of $r$. Then $r' \models IC$ and $r' \in Min_{\leq_r}(IC)$. Since $r'$ is a model of
$IC$, by Theorem [I], $r'$ contains an open branch $I$ of the tableau $TP(IC)$ for $IC$.
We have $r' = (r \setminus L) \cup K$ and, since $r'$ is minimal wrt $\leq_r$, there is no $r''$ closer
to $r$ than $r'$, i.e. there is no $r'' = (r \setminus L') \cup K'$ such that $L' \subset L$ and $K' \subset K$.
Hence $r' \cup I$ is a minimal opening of $r \cup I$.

On the other hand, let $I \cup r'$ be a minimal opening of $I \cup r$ in $TP(IC \cup r)$,
where $I$ is an open branch of $TP(IC)$. Then, by Definition [| $r' = (r \setminus L) \cup K$,
where $L = \{l : l \in r \text{ and } \neg l \in I\}$ and $K = \{l : l \in I \text{ and there is no substitution } \sigma
\text{ such that } l\sigma \in r\}$. By Lemma |], we have $r \Delta r' = L \cup K$. Since $I \cup r'$ is a minimal
opening of $I \cup r$, we have by Theorem |j. that there are no $r''$, $L''$ and $K''$ such
that $r''$ is an opening of $r$ and $r'' = (r \setminus L'') \cup K''$ and $L'' \subset L$ and $K'' \subset K$.
By Lemma |], this means that there is no $r''$ such that $r \Delta r'' \subset r \Delta r'$, i.e. $r'$ is a
minimal element of $Mod(IC)$ wrt the order $\leq_r$.
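
The characterization of repairs as minimal models wrt the order induced by $r$ can be illustrated by exhaustive search on a small instance. The Python sketch below is not the tableaux procedure of this paper: it assumes a finite domain {a, b}, the single constraint that every P-tuple is also a Q-tuple, and it simply enumerates all instances over the resulting Herbrand base. The names `satisfies_ic` and `repairs` are hypothetical.

```python
from itertools import combinations

CONSTS = ["a", "b"]
HERBRAND = [f"{p}({c})" for p in "PQR" for c in CONSTS]

def satisfies_ic(inst):
    # IC: for all x, P(x) -> Q(x), grounded over CONSTS
    return all(f"P({c})" not in inst or f"Q({c})" in inst for c in CONSTS)

def repairs(r):
    """Models of the IC whose symmetric difference with r is minimal under set inclusion."""
    r = frozenset(r)
    models = [frozenset(s)
              for k in range(len(HERBRAND) + 1)
              for s in combinations(HERBRAND, k)
              if satisfies_ic(frozenset(s))]
    delta = {m: m ^ r for m in models}
    # keep m unless some other model changes a strict subset of what m changes
    return [m for m in models if not any(delta[o] < delta[m] for o in models)]

r = {"P(a)", "R(b)"}
reps = repairs(r)
# Ground atoms that hold in every repair, i.e. consistent answers to atomic queries
consistent_atoms = frozenset.intersection(*reps)
for rep in reps:
    print(sorted(rep))
print(sorted(consistent_atoms))
```

On this instance the search yields two repairs, one obtained by deleting `P(a)` and one by inserting `Q(a)`, and `R(b)` is the only atom true in both. The enumeration is exponential in the size of the Herbrand base and is meant only as a specification against which a tableaux based implementation could be tested.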


